How to Claim a Discovery
W.A. Rolke and A.M. Lopez
University of Puerto Rico - Mayaguez 

We describe a statistical hypothesis test for the presence of a signal. The test allows the researcher 
to fix the signal location and/or width a priori, or perform a search to find the signal region that 
maximizes the signal. The background rate and/or distribution can be known or might be estimated 
from the data. Cuts can be used to bring out the signal. 



I. INTRODUCTION



Setting limits for new particles or decay modes has been an active research area for many years. In high energy physics it received renewed interest with the unified method of Feldman and Cousins. Giunti, and Roe and Woodroofe, gave variations of the unified method, trying to resolve an apparent anomaly that arises when there are fewer events in the signal region than expected. All of them discuss the problem of setting limits for the case of a known background rate. The case of an unknown background rate was discussed in a conference talk by Feldman, and a method for handling it was developed by Rolke and Lopez.

Little work has been done, though, on the question of claiming a discovery. This problem could be handled by finding a confidence interval and claiming a discovery if the lower limit is positive. Instead, the question of a discovery should be addressed separately, by performing a hypothesis test with the null hypothesis $H_0$: "There is no signal present". Rejecting this hypothesis then leads to the claim of a new discovery. In carrying out a hypothesis test one needs to decide on the type I error probability $\alpha$, the probability of falsely rejecting the null hypothesis. This is precisely the mistake that most needs to be guarded against, namely falsely claiming a discovery.

In practice a hypothesis test is often carried out by finding the p-value. This is the probability that an identical experiment will yield a result as extreme as the observed one (with respect to the null hypothesis) or even more so, given that the null hypothesis is true. If $p < \alpha$ we reject $H_0$; otherwise we fail to reject it. For the test discussed here the p-value cannot be computed analytically, and therefore we find it via Monte Carlo.
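The Monte Carlo p-value described above amounts to counting how often a simulated null experiment is at least as extreme as the observed one. The following is a minimal Python sketch under that reading; `monte_carlo_p_value` and the caller-supplied `simulate_null` are hypothetical names, and the toy null distribution ($T \sim N(0,1)$) is chosen only because its tail probability is known.

```python
import random
from typing import Callable


def monte_carlo_p_value(t_obs: float,
                        simulate_null: Callable[[random.Random], float],
                        n_runs: int,
                        seed: int = 0) -> float:
    """Estimate p = P(T >= t_obs | H0) from n_runs simulated experiments.

    simulate_null(rng) is a hypothetical user-supplied function: it must
    generate one complete pseudo-experiment under the null hypothesis,
    repeat the full analysis on it, and return the resulting T.
    """
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n_runs) if simulate_null(rng) >= t_obs)
    return exceed / n_runs


# Toy check with a known null distribution: T ~ N(0, 1), so P(T >= 2) is about 0.023.
p = monte_carlo_p_value(2.0, lambda rng: rng.gauss(0.0, 1.0), n_runs=100_000)
print(p)
```

Since the smallest non-zero estimate is $1/n_{\text{runs}}$, the number of runs has to be well above $1/\alpha$ for a small threshold to be resolved.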

Perhaps the most important decision in carrying out a hypothesis test is the choice of $\alpha$, or what we might call the discovery threshold. As we shall see, this decision is made much easier by the method described here because we need only one threshold, regardless of how the analysis was done. What a proper discovery threshold should be in high energy physics is a question outside the scope of this paper, although we might suggest $\alpha = 0.001$ (roughly equivalent to $3\sigma$). Sinervo argues for a much stricter standard of $5\sigma$, or $\alpha = 2.9 \times 10^{-7}$. We believe that such extreme values were used in the past because it was felt that the calculated p-values were biased downward by the analysis process, and a small $\alpha$ was needed to compensate for any unwittingly introduced biases. If we can trust that our p-value is in fact correct, a 1 in 1000 error rate should be acceptable.
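The correspondence between $\alpha$ and a number of standard deviations used above is the one-sided Gaussian tail probability. A small sketch with Python's standard-library `statistics.NormalDist` (the function names are illustrative):

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def alpha_to_sigma(alpha: float) -> float:
    """One-sided significance Z such that P(N(0,1) > Z) = alpha."""
    return -_STD_NORMAL.inv_cdf(alpha)


def sigma_to_alpha(z: float) -> float:
    """One-sided tail probability P(N(0,1) > z), computed as Phi(-z)."""
    return _STD_NORMAL.cdf(-z)


print(f"{alpha_to_sigma(0.001):.2f}")  # 3.09
print(f"{sigma_to_alpha(5.0):.2e}")    # 2.87e-07
```

Computing the tail as $\Phi(-z)$ rather than $1 - \Phi(z)$ avoids losing precision for large significances.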

A general introduction to hypothesis testing with applications to high energy physics is given in Sinervo. A classic reference for the theory of hypothesis testing is Lehmann.



II. THE SIGNAL TEST 

Our test uses $T = x - b$ or $T = x - y/r$ as the test statistic, depending on whether the background rate $b$ is assumed to be known or not. Here $x$ is the number of observations in the signal region, $y$ is the number of observations in the background region, and $r$ is the probability that a background event falls into the background region divided by the probability that it falls into the signal region. Therefore $y/r$ is the estimated background in the signal region and $x - y/r$ is an estimate of the signal rate $\lambda$. $T$ is the maximum likelihood estimator of $\lambda$; it is the quantity used by Feldman and Cousins, except that it is not set to 0 when $x - y/r$ is negative. Truncation is not necessary here because a negative value of $x - y/r$ clearly leads to a failure to reject $H_0$.
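A minimal sketch of the two forms of $T$, assuming a flat background on [0, 1] so that $r$ is simply the ratio of the widths of the background and signal regions; the helper names are hypothetical. The numbers are those of the flat-background example in this section.

```python
def t_known_background(x: int, b: float) -> float:
    """T = x - b: observed count in the signal region minus the known background rate."""
    return x - b


def t_estimated_background(x: int, y: int, r: float) -> float:
    """T = x - y/r: y/r estimates the background in the signal region."""
    return x - y / r


def r_flat(low: float, high: float) -> float:
    """r for a flat background on [0, 1] with signal region [low, high]:
    P(background region) / P(signal region) = (1 - width) / width."""
    width = high - low
    return (1.0 - width) / width


r = r_flat(0.44, 0.56)
t = t_estimated_background(25, 75, r)
print(f"r = {r:.2f}, T = {t:.2f}")  # r = 7.33, T = 14.77
```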

Other choices for the test statistic are of course possible. For example, a measure for the size of a signal that is often used in high energy physics is $S/\sqrt{b}$. Under the null hypothesis this statistic is approximately Gaussian, at least if there is sufficient data. Unfortunately, the approximation is not good enough in the extreme tails where a new discovery is made, leading to p-values that are much smaller than is warranted. Even when Monte Carlo is used to compute the true p-value, this test statistic can be shown to be inferior to the one proposed in our method because it has consistently lower power, that is, its probability of detecting a real signal is smaller.
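To illustrate the tail problem in the simplest setting, with $b$ known and the signal region fixed, the exact null distribution of $x$ is Poisson($b$), so the Gaussian p-value from $S/\sqrt{b}$ with $S = x - b$ can be compared directly with the exact Poisson tail. The sketch below does this in Python; the values $b = 10$ and $x = 25$ are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def poisson_upper_tail(x: int, b: float) -> float:
    """Exact P(X >= x) for X ~ Poisson(b), b > 0, summed term by term."""
    k = x
    term = math.exp(-b + x * math.log(b) - math.lgamma(x + 1))  # P(X = x)
    total = 0.0
    # Keep adding while terms still grow (k < b) or still matter numerically.
    while k < b or term > 1e-17 * total:
        total += term
        k += 1
        term *= b / k  # P(X = k) from P(X = k - 1)
    return total


def gaussian_p(x: int, b: float) -> float:
    """Approximate p-value from S/sqrt(b), treating it as N(0, 1) under H0."""
    return NormalDist().cdf(-(x - b) / math.sqrt(b))


b, x = 10.0, 25  # hypothetical known background and observed count
print(f"Gaussian approximation: {gaussian_p(x, b):.1e}")
print(f"exact Poisson tail:     {poisson_upper_tail(x, b):.1e}")
```

For these values the Gaussian approximation gives a p-value more than an order of magnitude smaller than the exact one.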

In order to find the p-value of the test we need to know the null distribution. In the simplest case of a known background rate and everything else fixed, this is given by the Poisson distribution, but in all other cases the null distribution cannot be computed analytically, and we therefore find it via Monte Carlo. As an illustration, consider the case shown in figure 1: here we have 100 events on the interval [0, 1], with the signal region set a priori to [0.44, 0.56]. There are 25 events in the signal region, and the background distribution is known to be flat. Therefore we find $x = 25$, $y = 75$, $r = 0.88/0.12 = 7.33$ and $T = 25 - 75/7.33 = 14.77$. Because we know that the background is flat on [0, 1], and because under the null hypothesis all 100 events are background, we can simulate this experiment by drawing 100 observations from a uniform distribution on [0, 1] and computing $T$ for this Monte Carlo data set. Repeating this 150000 times, we find the histogram of Monte Carlo $T$ values shown in figure 2, case 1. In this simulation 8 of the 150000 runs had a value of $T$ greater than 14.77, giving $p = 8/150000 = 0.000053$. Using $\alpha = 0.0001$ we would therefore reject the null hypothesis and claim a discovery. Note that in addition to rejecting the null hypothesis we can also turn the p-value into a significance by using the Gaussian distribution, and claim that this signal is a $3.87\sigma$ effect.
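The case 1 simulation can be sketched in a few lines of Python, assuming the flat background and the fixed region [0.44, 0.56] of figure 1 and counting runs with $T$ greater than the observed value, as in the text. The function names are hypothetical, and the estimate will vary with the seed and the number of runs.

```python
import random


def t_fixed_region(events, low=0.44, high=0.56):
    """T = x - y/r for a flat background on [0, 1] and signal region [low, high]."""
    x = sum(1 for e in events if low <= e <= high)
    y = len(events) - x
    width = high - low
    return x - y * width / (1.0 - width)  # y/r with r = (1 - width)/width


def p_value_fixed_region(t_obs, n_events=100, n_runs=150_000, seed=0):
    """Fraction of null runs (n_events uniform draws) with T greater than t_obs."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_runs):
        events = [rng.random() for _ in range(n_events)]
        if t_fixed_region(events) > t_obs:
            exceed += 1
    return exceed / n_runs


# Stand-in for the data of figure 1: only the counts (25 inside, 75 outside) matter.
observed = [0.5] * 25 + [0.1] * 75
t_obs = t_fixed_region(observed)
p = p_value_fixed_region(t_obs, n_runs=10_000)  # the text uses 150000 runs
print(t_obs, p)
```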

How would things change if the signal region had not been fixed a priori, but instead had been found by searching through all signal regions centered at 0.5, accepting any signal with a width between 0.01 and 0.2? That is, what if we had kept the signal location fixed but chosen the signal width that maximizes $T$, the estimate of the number of signal events? Again we can find the null distribution via Monte Carlo, repeating the exact analysis for each simulation run individually. The histogram of $T$ values for this case is shown in figure 2, case 2. Here we find a value of $T$ larger than 14.77 in 570 of the 150000 runs, for a p-value of 0.0038 or $2.67\sigma$. At a discovery threshold of $\alpha = 0.001$ we would therefore no longer find this signal significant.
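For case 2 every simulated data set must go through the same width search as the real data. The sketch below assumes the flat background of figure 1, the center 0.5 and widths in [0.01, 0.2]. It uses the fact that, for fixed $x$ and $y$, $T = x - y\,w/(1-w)$ does not increase with the width $w$, so only the minimum width and widths whose edges touch an observation need to be tried. The names are hypothetical.

```python
import bisect
import random


def t_max_over_width(events, center=0.5, w_min=0.01, w_max=0.2):
    """Largest T = x - y/r over signal regions [center - w/2, center + w/2],
    w_min <= w <= w_max, for a flat background on [0, 1].

    For fixed counts (x, y), T does not increase with w, so the maximum is
    reached at w_min or at a width whose edge lies exactly on an observation.
    """
    n = len(events)
    dist = sorted(abs(e - center) for e in events)
    half_widths = [w_min / 2] + [d for d in dist if w_min / 2 <= d <= w_max / 2]
    best = float("-inf")
    for h in half_widths:
        x = bisect.bisect_right(dist, h)  # observations with |e - center| <= h
        w = 2.0 * h
        best = max(best, x - (n - x) * w / (1.0 - w))
    return best


def p_value_width_search(t_obs, n_events=100, n_runs=150_000, seed=0):
    """Repeat the full width search on every null run; count T greater than t_obs."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_runs):
        events = [rng.random() for _ in range(n_events)]
        if t_max_over_width(events) > t_obs:
            exceed += 1
    return exceed / n_runs


print(p_value_width_search(14.77, n_runs=5_000))
```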

Going further, what if we also let the signal location vary, say anywhere in [0.2, 0.8]? That is, for any pair of values $(L, H)$ we define $[L, H]$ as the signal region and $[0, L) \cup (H, 1]$ as the background region, compute $T_{L,H}$ for this pair, and then maximize over all possible values of $L$ and $H$. Narrowing $[L, H]$ without crossing an observation leaves $x$ and $y$ unchanged but increases $r$, and $T_{L,H}$ is monotonically increasing in $r$ as long as all the observations stay in the same region (signal or background). We can therefore find the maximum fairly quickly by letting $L$ and $H$ be the actual observations. The histogram of $T_{L,H}$ values for this case is shown in figure 2, case 3. We find a value of $T$ larger than 14.77 in 9750 of the 150000 runs, for a p-value of 0.065 or $1.51\sigma$, clearly not significant.
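A sketch of the case 3 search, again assuming a flat background on [0, 1] and following the shortcut in the text of using observed values as the endpoints $L$ and $H$, with the search restricted to [0.2, 0.8] and to widths between 0.01 and 0.2. As a simplification (an assumption of this sketch, not a detail from the text), candidate pairs narrower than 0.01 are skipped rather than widened. Names are hypothetical.

```python
import random


def t_max_over_region(events, lo=0.2, hi=0.8, w_min=0.01, w_max=0.2):
    """Largest T_{L,H} = x - y/r over signal regions [L, H] inside [lo, hi],
    flat background on [0, 1], with L and H restricted to observed values.

    Simplification: pairs narrower than w_min are skipped, not widened.
    """
    n = len(events)
    pts = sorted(e for e in events if lo <= e <= hi)
    best = float("-inf")
    for i, left in enumerate(pts):
        for j in range(i, len(pts)):
            width = pts[j] - left
            if width > w_max:
                break  # pts is sorted, so only wider pairs follow
            if width < w_min:
                continue
            x = j - i + 1  # observations in [left, pts[j]]
            best = max(best, x - (n - x) * width / (1.0 - width))
    return best


def p_value_region_search(t_obs, n_events=100, n_runs=150_000, seed=0):
    """Repeat the location-and-width search on every null run."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_runs):
        events = [rng.random() for _ in range(n_events)]
        if t_max_over_region(events) > t_obs:
            exceed += 1
    return exceed / n_runs


print(p_value_region_search(14.77, n_runs=1_000))
```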

It was necessary in the second and third cases above to limit the search somewhat, to the interval [0.2, 0.8] and to signals at least 0.01 and at most 0.2 wide, because otherwise the largest value of $T$ is almost always found for a very wide signal region, even when a clear narrow signal is present. This restriction does not induce a bias as long as the decisions on where to search are made a priori.

In the general situation where the background is not flat on [0, 1], we can make use of the probability integral transform: if background events have a continuous cumulative distribution function $F$, then $F(X)$ is uniformly distributed on [0, 1], so the transformed data can be analyzed exactly as in the flat case. Of course this requires knowledge of the background distribution $F$, but if it is not known we can estimate it from the data, either by fitting a parametric function to the data or by using a nonparametric density estimator. Again, all calculations are done under the null hypothesis, so we do not need to worry about the signal or its distribution.
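As a minimal sketch of the probability integral transform, assume (hypothetically) a known linear background density proportional to $c_0 + c_1 x$ on [3, 5]. Applying its CDF $F$ maps background events to a flat distribution on [0, 1]. The coefficients are made up; in practice a fitted $F$ would take their place, with the fit repeated on each Monte Carlo data set.

```python
import random
from statistics import fmean, pvariance


def linear_cdf(x, a, b, c0, c1):
    """CDF of a background density proportional to c0 + c1*x on [a, b]
    (assumes c0 + c1*x >= 0 on the interval)."""
    def antiderivative(t):
        return c0 * t + 0.5 * c1 * t * t
    x = min(max(x, a), b)  # clamp to the support
    return (antiderivative(x) - antiderivative(a)) / (antiderivative(b) - antiderivative(a))


def sample_linear(rng, n, a, b, c0, c1):
    """Draw n background events from the linear density by rejection sampling."""
    f_max = max(c0 + c1 * a, c0 + c1 * b)
    events = []
    while len(events) < n:
        x = rng.uniform(a, b)
        if rng.random() * f_max <= c0 + c1 * x:
            events.append(x)
    return events


A, B, C0, C1 = 3.0, 5.0, 1.0, 0.5  # hypothetical background shape on [3, 5]
rng = random.Random(1)
background = sample_linear(rng, 10_000, A, B, C0, C1)
u = [linear_cdf(x, A, B, C0, C1) for x in background]
# Flat on [0, 1]: mean near 1/2, variance near 1/12.
print(round(fmean(u), 3), round(pvariance(u), 4))
```

A signal region $[L, H]$ in the original variable becomes $[F(L), F(H)]$ after the transform.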

As long as we repeat for the Monte Carlo events exactly what was done for the real data, we will find the correct p-value. This includes any cuts used to improve the signal-to-noise ratio, but it then requires the ability to correctly simulate all the variables used for cutting, including their correlations.
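This principle is easiest to honor when the real data and every simulated data set pass through one and the same analysis function. The sketch below assumes a hypothetical auxiliary variable $q$, correlated with the mass $m$, and a hypothetical cut $q > 0.3$; the null model in `simulate_null_event` is an assumption that would have to reproduce the real correlations.

```python
import random

CUT = 0.3  # hypothetical cut on the auxiliary variable q


def analyze(events, low=0.44, high=0.56):
    """The single analysis applied to real and simulated data alike:
    keep events with q > CUT, then compute T = x - y/r as for a flat
    background on [0, 1] with signal region [low, high]."""
    kept = [m for m, q in events if q > CUT]
    x = sum(1 for m in kept if low <= m <= high)
    width = high - low
    return x - (len(kept) - x) * width / (1.0 - width)


def simulate_null_event(rng):
    """Hypothetical null model: mass m flat on [0, 1], q correlated with m."""
    m = rng.random()
    return m, 0.5 * m + rng.gauss(0.0, 0.2)


def p_value_with_cut(data, n_runs=10_000, seed=0):
    """Monte Carlo p-value that repeats the cut and the analysis on every run."""
    t_obs = analyze(data)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_runs):
        pseudo = [simulate_null_event(rng) for _ in range(len(data))]
        if analyze(pseudo) >= t_obs:
            exceed += 1
    return exceed / n_runs


rng = random.Random(42)
data = [simulate_null_event(rng) for _ in range(100)]  # stand-in data set
print(p_value_with_cut(data, n_runs=2_000))
```

After the cut the kept background is no longer exactly flat in $m$, so the $r$ used inside `analyze` is only nominal. $T$ remains a valid test statistic, and the p-value stays correct because the simulated runs go through exactly the same steps.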



III. PERFORMANCE OF THE METHOD 

As an illustration of the performance of the signal test, consider the following experiment: we generate 100 events from a linear background on [3, 5] and (if present) a Gaussian signal at 3.9 with a width of 0.05. We then search for the signal under a variety of scenarios, from one extreme where everything is fixed a priori to the other where the largest signal of any width is found. The background density is found by fitting, and the background rate is estimated. The power curves are shown in figure 3. No matter which combination of items was fixed a priori or used to maximize the test statistic (and with it the signal-to-noise ratio), all cases achieved the desired type I error probability, $\alpha = 0.05$. Not surprisingly, the more items are fixed a priori, the better the power of the test.
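A power curve gives, for each signal size, the probability that the test rejects $H_0$. The sketch below estimates it for the simplest scenario only: the background shape is taken as known (not fitted), the signal region [3.8, 4.0] is fixed a priori, and each experiment has 100 events of which `n_signal` are signal. The linear background coefficients and the region are hypothetical; only the support [3, 5] and the signal at 3.9 with width 0.05 come from the text.

```python
import bisect
import random

A, B = 3.0, 5.0          # background support, as in the text
C0, C1 = 1.0, 0.5        # hypothetical linear background shape c0 + c1*x
MU, SIGMA = 3.9, 0.05    # Gaussian signal, as in the text
LOW, HIGH = 3.8, 4.0     # hypothetical signal region, fixed a priori
N_EVENTS = 100           # total events per experiment (assumption)


def _antiderivative(t):
    return C0 * t + 0.5 * C1 * t * t


# Probability that a background event lands in the signal region, and r.
P_SIG = (_antiderivative(HIGH) - _antiderivative(LOW)) / (_antiderivative(B) - _antiderivative(A))
R = (1.0 - P_SIG) / P_SIG


def background_event(rng):
    """One background event from the linear density (rejection sampling)."""
    f_max = max(C0 + C1 * A, C0 + C1 * B)
    while True:
        x = rng.uniform(A, B)
        if rng.random() * f_max <= C0 + C1 * x:
            return x


def experiment_t(rng, n_signal):
    """T = x - y/r for one experiment with n_signal signal events."""
    events = [rng.gauss(MU, SIGMA) for _ in range(n_signal)]
    events += [background_event(rng) for _ in range(N_EVENTS - n_signal)]
    x = sum(1 for e in events if LOW <= e <= HIGH)
    return x - (len(events) - x) / R


def power(n_signal, alpha=0.05, n_null=5_000, n_trials=1_000, seed=0):
    """Fraction of signal experiments whose Monte Carlo p-value is below alpha."""
    rng = random.Random(seed)
    null_t = sorted(experiment_t(rng, 0) for _ in range(n_null))
    rejections = 0
    for _ in range(n_trials):
        t = experiment_t(rng, n_signal)
        p = (n_null - bisect.bisect_left(null_t, t)) / n_null  # P(T >= t | H0)
        if p < alpha:
            rejections += 1
    return rejections / n_trials


for s in (0, 10, 20):
    print(s, power(s, n_null=2_000, n_trials=300))
```

With `n_signal = 0` the rejection rate estimates the type I error probability, which should come out at or below $\alpha$ up to Monte Carlo noise.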



IV. CONCLUSION 

We have described a statistical hypothesis test for 
the presence of a signal. Our test is conceptually sim- 
ple and very flexible, allowing the researcher a wide 
variety of choices during the analysis stage. It will 
yield the correct type I error probability as long as the 
Monte Carlo used to find the null distribution exactly 
mirrors the steps taken for the data. Monte Carlo 
studies have shown that this method has satisfactory 
power. 
FIG. 1: 100 events on [0, 1], with the signal region set a priori to [0.44, 0.56]. There are 25 events in the signal region, and the background distribution is assumed to be flat.



FIG. 2: Histograms of T values of Monte Carlo simulation. 
FIG. 3: Power curves for 10 different cases, such as signal location fixed a priori or not, signal width fixed a priori or not, background estimated or not, etc. $\alpha = 0.05$ is used.